What is Image Denoising? Image denoising is the process of removing noise from images to improve their quality.
Papers and Code
Sep 11, 2025
Abstract:Among generative models, diffusion models are uniquely intriguing due to the existence of a closed-form optimal minimizer of their training objective, often referred to as the optimal denoiser. However, diffusion using this optimal denoiser merely reproduces images in the training set and hence fails to capture the behavior of deep diffusion models. Recent work has attempted to characterize this gap between the optimal denoiser and deep diffusion models, proposing analytical, training-free models that can generate images that resemble those generated by a trained UNet. The best-performing method hypothesizes that shift equivariance and locality inductive biases of convolutional neural networks are the cause of the performance gap, hence incorporating these assumptions into its analytical model. In this work, we present evidence that the locality in deep diffusion models emerges as a statistical property of the image dataset, not due to the inductive bias of convolutional neural networks. Specifically, we demonstrate that an optimal parametric linear denoiser exhibits similar locality properties to the deep neural denoisers. We further show, both theoretically and experimentally, that this locality arises directly from the pixel correlations present in natural image datasets. Finally, we use these insights to craft an analytical denoiser that better matches scores predicted by a deep diffusion model than the prior expert-crafted alternative.
* 30 pages, 18 figures, 6 tables
Via

Sep 11, 2025
Abstract:We explore the connection between Plug-and-Play (PnP) methods and Denoising Diffusion Implicit Models (DDIM) for solving ill-posed inverse problems, with a focus on single-pixel imaging. We begin by identifying key distinctions between PnP and diffusion models-particularly in their denoising mechanisms and sampling procedures. By decoupling the diffusion process into three interpretable stages: denoising, data consistency enforcement, and sampling, we provide a unified framework that integrates learned priors with physical forward models in a principled manner. Building upon this insight, we propose a hybrid data-consistency module that linearly combines multiple PnP-style fidelity terms. This hybrid correction is applied directly to the denoised estimate, improving measurement consistency without disrupting the diffusion sampling trajectory. Experimental results on single-pixel imaging tasks demonstrate that our method achieves better reconstruction quality.
Via

Sep 11, 2025
Abstract:Modern smartphones are equipped with Lidar sensors providing depth-sensing capabilities. Recent works have shown that this complementary sensor allows to improve various tasks in image processing, including deblurring. However, there is a current lack of datasets with realistic blurred images and paired mobile Lidar depth maps to further study the topic. At the same time, there is also a lack of blind zero-shot methods that can deblur a real image using the depth guidance without requiring extensive training sets of paired data. In this paper, we propose an image deblurring method based on denoising diffusion models that can leverage the Lidar depth guidance and does not require training data with paired Lidar depth maps. We also present the first dataset with real blurred images with corresponding Lidar depth maps and sharp ground truth images, acquired with an Apple iPhone 15 Pro, for the purpose of studying Lidar-guided deblurring. Experimental results on this novel dataset show that Lidar guidance is effective and the proposed method outperforms state-of-the-art deblurring methods in terms of perceptual quality.
Via

Sep 11, 2025
Abstract:In real-world traffic surveillance, vehicle images captured under adverse weather, poor lighting, or high-speed motion often suffer from severe noise and blur. Such degradations significantly reduce the accuracy of license plate recognition systems, especially when the plate occupies only a small region within the full vehicle image. Restoring these degraded images a fast realtime manner is thus a crucial pre-processing step to enhance recognition performance. In this work, we propose a Vertical Residual Autoencoder (VRAE) architecture designed for the image enhancement task in traffic surveillance. The method incorporates an enhancement strategy that employs an auxiliary block, which injects input-aware features at each encoding stage to guide the representation learning process, enabling better general information preservation throughout the network compared to conventional autoencoders. Experiments on a vehicle image dataset with visible license plates demonstrate that our method consistently outperforms Autoencoder (AE), Generative Adversarial Network (GAN), and Flow-Based (FB) approaches. Compared with AE at the same depth, it improves PSNR by about 20%, reduces NMSE by around 50%, and enhances SSIM by 1%, while requiring only a marginal increase of roughly 1% in parameters.
Via

Sep 11, 2025
Abstract:As an influential information fusion and low-level vision technique, image fusion integrates complementary information from source images to yield an informative fused image. A few attempts have been made in recent years to jointly realize image fusion and super-resolution. However, in real-world applications such as military reconnaissance and long-range detection missions, the target and background structures in multimodal images are easily corrupted, with low resolution and weak semantic information, which leads to suboptimal results in current fusion techniques. In response, we propose FS-Diff, a semantic guidance and clarity-aware joint image fusion and super-resolution method. FS-Diff unifies image fusion and super-resolution as a conditional generation problem. It leverages semantic guidance from the proposed clarity sensing mechanism for adaptive low-resolution perception and cross-modal feature extraction. Specifically, we initialize the desired fused result as pure Gaussian noise and introduce the bidirectional feature Mamba to extract the global features of the multimodal images. Moreover, utilizing the source images and semantics as conditions, we implement a random iterative denoising process via a modified U-Net network. This network istrained for denoising at multiple noise levels to produce high-resolution fusion results with cross-modal features and abundant semantic information. We also construct a powerful aerial view multiscene (AVMS) benchmark covering 600 pairs of images. Extensive joint image fusion and super-resolution experiments on six public and our AVMS datasets demonstrated that FS-Diff outperforms the state-of-the-art methods at multiple magnifications and can recover richer details and semantics in the fused images. The code is available at https://github.com/XylonXu01/FS-Diff.
* Information Fusion, 2025, 121: 103146
Via

Sep 10, 2025
Abstract:Multispectral computed tomography (CT) enables advanced material characterization by acquiring energy-resolved projection data. However, since the incoming X-ray flux is be distributed across multiple narrow energy bins, the photon count per bin is greatly reduced compared to standard energy-integrated imaging. This inevitably introduces substantial noise, which can either prolong acquisition times and make scan durations infeasible or degrade image quality with strong noise artifacts. To address this challenge, we present a dedicated neural network-based denoising approach tailored for multispectral CT projections acquired at the BM18 beamline of the ESRF. The method exploits redundancies across angular, spatial, and spectral domains through specialized sub-networks combined via stacked generalization and an attention mechanism. Non-local similarities in the angular-spatial domain are leveraged alongside correlations between adjacent energy bands in the spectral domain, enabling robust noise suppression while preserving fine structural details. Training was performed exclusively on simulated data replicating the physical and noise characteristics of the BM18 setup, with validation conducted on CT scans of custom-designed phantoms containing both high-Z and low-Z materials. The denoised projections and reconstructions demonstrate substantial improvements in image quality compared to classical denoising methods and baseline CNN models. Quantitative evaluations confirm that the proposed method achieves superior performance across a broad spectral range, generalizing effectively to real-world experimental data while significantly reducing noise without compromising structural fidelity.
Via

Sep 09, 2025
Abstract:This paper presents a novel, mathematically rigorous framework for autoencoder-type deep neural networks that combines optimal control theory and low-rank tensor methods to yield memory-efficient training and automated architecture discovery. The learning task is formulated as an optimization problem constrained by differential equations representing the encoder and decoder components of the network and the corresponding optimality conditions are derived via a Lagrangian approach. Efficient memory compression is enabled by approximating differential equation solutions on low-rank tensor manifolds using an adaptive explicit integration scheme. These concepts are combined to form OCTANE (Optimal Control for Tensor-based Autoencoder Network Emergence) -- a unified training framework that yields compact autoencoder architectures, reduces memory usage, and enables effective learning, even with limited training data. The framework's utility is illustrated with application to image denoising and deblurring tasks and recommendations regarding governing hyperparameters are provided.
Via

Sep 11, 2025
Abstract:Predicting the spatio-temporal progression of brain tumors is essential for guiding clinical decisions in neuro-oncology. We propose a hybrid mechanistic learning framework that combines a mathematical tumor growth model with a guided denoising diffusion implicit model (DDIM) to synthesize anatomically feasible future MRIs from preceding scans. The mechanistic model, formulated as a system of ordinary differential equations, captures temporal tumor dynamics including radiotherapy effects and estimates future tumor burden. These estimates condition a gradient-guided DDIM, enabling image synthesis that aligns with both predicted growth and patient anatomy. We train our model on the BraTS adult and pediatric glioma datasets and evaluate on 60 axial slices of in-house longitudinal pediatric diffuse midline glioma (DMG) cases. Our framework generates realistic follow-up scans based on spatial similarity metrics. It also introduces tumor growth probability maps, which capture both clinically relevant extent and directionality of tumor growth as shown by 95th percentile Hausdorff Distance. The method enables biologically informed image generation in data-limited scenarios, offering generative-space-time predictions that account for mechanistic priors.
* 13 pages, 4 figures
Via

Sep 09, 2025
Abstract:Recent studies have demonstrated the effectiveness of directly aligning diffusion models with human preferences using differentiable reward. However, they exhibit two primary challenges: (1) they rely on multistep denoising with gradient computation for reward scoring, which is computationally expensive, thus restricting optimization to only a few diffusion steps; (2) they often need continuous offline adaptation of reward models in order to achieve desired aesthetic quality, such as photorealism or precise lighting effects. To address the limitation of multistep denoising, we propose Direct-Align, a method that predefines a noise prior to effectively recover original images from any time steps via interpolation, leveraging the equation that diffusion states are interpolations between noise and target images, which effectively avoids over-optimization in late timesteps. Furthermore, we introduce Semantic Relative Preference Optimization (SRPO), in which rewards are formulated as text-conditioned signals. This approach enables online adjustment of rewards in response to positive and negative prompt augmentation, thereby reducing the reliance on offline reward fine-tuning. By fine-tuning the FLUX model with optimized denoising and online reward adjustment, we improve its human-evaluated realism and aesthetic quality by over 3x.
* 15 pages
Via

Sep 09, 2025
Abstract:Joint reconstruction of human-object interaction marks a significant milestone in comprehending the intricate interrelations between humans and their surrounding environment. Nevertheless, previous optimization methods often struggle to achieve physically plausible reconstruction results due to the lack of prior knowledge about human-object interactions. In this paper, we introduce ScoreHOI, an effective diffusion-based optimizer that introduces diffusion priors for the precise recovery of human-object interactions. By harnessing the controllability within score-guided sampling, the diffusion model can reconstruct a conditional distribution of human and object pose given the image observation and object feature. During inference, the ScoreHOI effectively improves the reconstruction results by guiding the denoising process with specific physical constraints. Furthermore, we propose a contact-driven iterative refinement approach to enhance the contact plausibility and improve the reconstruction accuracy. Extensive evaluations on standard benchmarks demonstrate ScoreHOI's superior performance over state-of-the-art methods, highlighting its ability to achieve a precise and robust improvement in joint human-object interaction reconstruction.
* Accepted by ICCV 2025
Via
